A novel corpus of children's disordered speech

Authors

  • Oscar Saz-Torralba
  • William Ricardo Rodríguez-Dueñas
  • Eduardo Lleida
  • Carlos Vaquero
Abstract

This paper introduces the acquisition, evaluation and baseline Automatic Speech Recognition (ASR) experiments of a novel corpus containing speech from a set of impaired and unimpaired young speakers. A group of 14 speakers with different speech disorders recorded several sessions over a 57-word Spanish vocabulary, yielding more than 3 hours of speech. In addition, a parallel corpus of more than 6 hours of speech from unimpaired young speakers has been recorded with the same vocabulary. The impaired speech corpus has been evaluated through manual labeling to detect the mispronunciations made by the speakers; the outcome shows that 17.31% of the phonemes were either mispronounced or deleted in an isolated word task. A baseline evaluation of a state-of-the-art ASR system shows a Word Error Rate (WER) of 35.02% when using speaker-independent models based on adult speech. This WER is reduced to 27.60% using models based on children's speech and further reduced to 15.35% using speaker-dependent models. Finally, experiments on connected speech with 4 impaired speakers show how ASR performance degrades in the transition from isolated words to connected speech, due to the language impairments of the speakers and the coarticulation in connected speech.


Similar resources

Recent advances in SONIC Italian children's speech recognition for interactive literacy tutors

Recent advances in SONIC Italian children’s speech recognition will be described. This work, completing a previous one developed in the past, was conducted with the specific goals of integrating the newly trained children’s speech recognition models into the Italian version of the Colorado Literacy Tutor platform. Specifically, children’s speech recognition research for Italian was conducted us...


The Effect of Colligational Corpus-based Instruction on Enhancing the Pragmalinguistic Knowledge of Request Speech Act among Iranian Intermediate EFL Learners

This study investigated the effectiveness of colligational corpus-based instruction on enhancing the pragmalinguistic knowledge of speech act of request among Iranian intermediate EFL learners. The objective of the study was to find out whether or not providing students with corpora through using colligational instruction had any significant effects on enhancing their pragmalinguistic knowledge...



Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech

This paper describes the continued development of a system to provide early assessment of speech development issues in children and better triaging to professional services. Whilst corpora of children’s speech are increasingly available, recognition of disordered children’s speech is still a data-scarce task. Transfer learning methods have been shown to be effective at leveraging out-of-domain ...


Publication date: 2008